Propagate sharded pubsub data via replication link #307

Open · wants to merge 1 commit into base: unstable

Conversation

@hpatro (Contributor) commented Apr 12, 2024

Issue: redis/redis#12196
Ref: redis/redis#12929


Sharded pubsub uses the cluster link for message propagation across the nodes. The cluster link has a high payload overhead and is particularly expensive when the message to be propagated is comparatively small.

As a sharded pubsub message needs to be propagated only within a shard, when the message from the client is received on a primary, the replication link can be used for propagation instead of the cluster link. There are two benefits we gain from this.

  1. Throughput will be higher compared to the initial implementation.
  2. The message delivery guarantee will be better, as messages accumulate in the client output buffer if the replication link disconnects and will be retried.

This is in line with how message propagation is performed in cluster-disabled mode.
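
To make the two paths concrete, here is a minimal compilable C sketch (illustrative only; the helper names and printfs stand in for the real clusterMsg/replication plumbing, they are not the actual Valkey code):

  #include <stdio.h>

  struct message { const char *channel; const char *payload; };

  /* Old path: the payload is wrapped in a full cluster bus packet (large
   * fixed header, including the sender's slot bitmap) and broadcast to
   * the other nodes in the shard. */
  static void publish_via_cluster_bus(struct message *m) {
      printf("cluster bus -> PUBLISHSHARD %s %s\n", m->channel, m->payload);
  }

  /* New path: the SPUBLISH command is appended verbatim to the
   * replication stream; replicas replay it and deliver it to their local
   * subscribers. On a link drop the command sits in the output
   * buffer/backlog and is delivered once the replica reconnects. */
  static void publish_via_replication(struct message *m) {
      printf("repl stream -> SPUBLISH %s %s\n", m->channel, m->payload);
  }

  int main(void) {
      struct message m = { "hello", "world" };
      publish_via_cluster_bus(&m);   /* before this change */
      publish_via_replication(&m);   /* after this change */
      return 0;
  }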

Notes:

  1. It's a breaking change: SPUBLISH is marked as a write command and can only be sent to a primary. The command will fail and the client will receive a MOVED response if it is published on a replica (see the session example after this list).
  2. The sharded pubsub logic that handles messages via cluster bus propagation needs to remain in place to avoid issue(s) during upgrade(s).
  3. Tested mixed-version nodes in a shard (Setup 1: unstable as primary, shardpubsub-replication-link as replica; Setup 2: shardpubsub-replication-link as primary, unstable as replica); subscribers were able to receive messages on the primary/replica in each setup.
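
To illustrate note 1, a client session might look like the following (the slot number is whatever CLUSTER KEYSLOT returns for the channel; 866 is assumed here for "hello"):

  $ redis-cli -p 6380 SPUBLISH hello world     # replica
  (error) MOVED 866 127.0.0.1:6379
  $ redis-cli -p 6379 SPUBLISH hello world     # primary
  (integer) 0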

Benchmark:

Setup:

  • 1 Primary (6379) + 1 Replica (6380)
  • No client subscription

Scenario 1: Message(s) published on primary:

Request:

src/redis-benchmark  -h 127.0.0.1 -l -p 6379 -n 10000000 -P 20 SPUBLISH hello world

Summary:

  throughput summary: 798148.25 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.184     0.264     1.199     1.391     1.455    12.159

Scenario 2: Message(s) published via cluster link (prior to this change):

Request:

src/redis-benchmark  -h 127.0.0.1 -l -p 6379 -n 10000000 -P 20 SPUBLISH hello world

Summary:

  throughput summary: 556916.94 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.732     0.528     1.487     1.663     1.727   153.471

Gain: ~43% higher throughput (798,148 vs. 556,917 requests per second) using the replication link over the cluster link; max latency also drops from 153.471 ms to 12.159 ms.

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
@hpatro (Contributor, Author) commented Apr 12, 2024

#76 would have been helpful here as well.

@madolson added the major-decision-pending label Apr 14, 2024
@madolson (Member) commented

There is another important point here: if we moved to cluster V2, we would no longer need to maintain cluster links for pubsub.

Some context: this was approved by the previous Redis core team, but it was never finished because of the license change.

@valkey-io/core-team Messaging folks here for approval. Please 👍 or 👎 with your opinion if you have one.

@zuiderkwast (Contributor) commented

We'd still need to support cluster bus for unsharded pubsub, right?

I fail to see the difficulty in sending data like this on a cluster v2 bus; it may just take multiple hops in the worst case. (We could even look at real message brokers like RabbitMQ for how they do this.)

Let's open an issue about cluster v2, what we need from it, and discuss different approaches.

@enjoy-binbin (Member) left a comment


LGTM, btw, why didn’t we choose replication stream for shard pub/sub at the beginning?

@madolson (Member) commented Apr 15, 2024

LGTM, btw, why didn’t we choose replication stream for shard pub/sub at the beginning?

I think compatibility with the way the cluster worked was the main reasoning, since you could send a message to a replica and read it on the primary. I don't recall why that seemed important at the time, though, and all usage that I have seen of it hasn't cared about that property.

@madolson (Member) commented

We'd still need to support cluster bus for unsharded pubsub, right?

For cluster bus, we will probably need to continue supporting it. Cluster v2 will be an opt-in feature, and I think we should consider dropping support for it.

Let's open an issue about cluster v2, what we need from it, and discuss different approaches.

This is true, we could still do it. My observation about usage is that it's not really used, and when it is used it has caused a lot of problems.

@zuiderkwast (Contributor) commented

Cluster v2 will be an opt-in feature

This is one of the major points I don't like. I think the cluster should do this version negotiation by itself. Two parallel solutions are much worse than gradually transforming the current solution.

Btw, I don't see any problem with propagating or broadcasting any data (like pubsub) in a future V2 cluster, although it may require multiple hops to reach all nodes if not all nodes are connected to all other nodes.

@soloestoy added the breaking-change label Apr 15, 2024
@soloestoy (Member) commented

I'm ok with it, and it is a breaking change.

@hpatro (Contributor, Author) commented Apr 15, 2024

LGTM, btw, why didn’t we choose replication stream for shard pub/sub at the beginning?

@hpatro (Contributor, Author) commented Apr 15, 2024

We'd still need to support cluster bus for unsharded pubsub, right?

As an alternative solution, we could maybe alias PUBLISH/SUBSCRIBE/UNSUBSCRIBE to SPUBLISH/SSUBSCRIBE/SUNSUBSCRIBE in cluster mode to keep supporting the functionality without using cluster links.

@zuiderkwast (Contributor) commented

As an alternative solution, we could maybe alias PUBLISH/SUBSCRIBE/UNSUBSCRIBE to SPUBLISH/SSUBSCRIBE/SUNSUBSCRIBE in cluster mode to keep supporting the functionality without using cluster links.

If we start returning MOVED for these, it's a breaking change in itself. Also consider a cluster with mixed versions of nodes.

I think it would be fairly easy to introduce a new message in the "legacy" cluster bus to send pubsub messages without the full Ping packet (slot bitmap, etc.). We just need a flag bit in MEET to indicate that a node supports this feature. It's low hanging fruit for better cluster performance IMO.
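
For what it's worth, such a frame could stay tiny. A sketch of what a publish-only packet might carry (field names are illustrative, not from the source; the point is dropping the full header, whose slot bitmap alone is roughly 2 KB):

  #include <stdint.h>

  #define LIGHT_MSG_PUBLISH 1    /* hypothetical new message type id */

  /* Carries only what delivery needs: no slot bitmap, no gossip section. */
  typedef struct {
      uint16_t type;             /* LIGHT_MSG_PUBLISH */
      uint32_t channel_len;      /* length of the channel name */
      uint32_t message_len;      /* length of the payload */
      unsigned char data[];      /* channel bytes, then payload bytes */
  } lightClusterMsgPublish;

A sender would use this frame only for peers that advertised the flag bit in MEET and fall back to the full packet otherwise.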

@madolson (Member) commented

I think it would be fairly easy to introduce a new message in the "legacy" cluster bus to send pubsub messages without the full Ping packet (slot bitmap, etc.). We just need a flag bit in MEET to indicate that a node supports this feature. It's low hanging fruit for better cluster performance IMO.

I'm going to argue that although it is possible, it doesn't seem like the right long-term solution. Replication still seems like the better fit for pubsub compared to the cluster bus.

@zuiderkwast (Contributor) commented

@madolson I agree replication is better, but the context of that (off-topic) idea was that we still need to support unsharded pubsub.

@PingXie (Member) commented May 20, 2024

I'm going to argue that although it is possible, it doesn't seem like the right long-term solution. Replication still seems like the better fit for pubsub compared to the cluster bus.

@madolson, I assume the long-term solution here is cluster V2? If so, agreed that using replication would be fully compatible with cluster v2. That said, I think a lighter version of clusterMsg, as @zuiderkwast suggested, is a good thing regardless. It works for both sharded and unsharded pubsub, which we would have to continue to support. It does mean, though, that sharded pubsub on cluster V2 would require a different solution when it could've reused the same implementation.

@soloestoy, is your call-out of "breaking change" referring to the mixed cluster case? I think it would apply to both the replication and the lighter clusterMsg ideas. I wonder if we could introduce a mutable server config to control this new behavior such that an admin can enable it only after they know for sure every node is running the latest version.

@zuiderkwast (Contributor) commented

I wonder if we could introduce a mutable server config to control this new behavior such that an admin can enable it only after they know for sure every node is running the latest version.

Let's avoid a config if we can. What's wrong with a capability bit in the clusterMsg? I don't get why people don't seem to like it. Otherwise, a REPLCONF flag or version field could solve this.
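
For reference, replicas already advertise capabilities during sync with REPLCONF (e.g. capa eof capa psync2), so gating the new propagation could be one extra token; the token name below is made up:

  replica -> primary: REPLCONF capa eof capa psync2 capa shard-pubsub
  primary -> replica: +OK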

@soloestoy (Member) commented

@soloestoy, is your call-out of "breaking change" referring to the mixed cluster case? I think it would apply to both the replication and the lighter clusterMsg ideas.

The breaking change I mean is that the SPUBLISH command is marked as a write command.

@hpatro (Contributor, Author) commented May 20, 2024

@soloestoy, is your call-out of "breaking change" referring to the mixed cluster case? I think it would apply to both the replication and the lighter clusterMsg ideas.

The breaking change I mean is that the SPUBLISH command is marked as a write command.

With the write-command association we get the redirection logic to the primary built in.
My initial change handled the redirection to the primary manually, which was a few conditional statements (it did the job) and wouldn't be a breaking change. Any preferences @valkey-io/core-team? I would lean towards a non-breaking change, as the Pub/Sub mechanism shouldn't be restricted due to a cluster coverage issue.

@madolson (Member) commented

I would lean towards a non-breaking change, as the Pub/Sub mechanism shouldn't be restricted due to a cluster coverage issue.

The concern I had with this is that we then aren't notifying clients about the redirection. Clients like the Go client would look up the command, see that it's not marked as write, and then send commands to replicas before getting MOVED. We should be able to mark a command as write without including it in the write category. That is why I was suggesting a special write category. We can use the same flag to also omit it from cluster convergence.

@zuiderkwast (Contributor) commented

We should be able to mark a command as write without including it in the write category.

We're talking about command flags implying ACL categories, right? I filed this a while ago: #417

@soloestoy (Member) commented

After reading #525, I changed my mind. To support multi-publish, we need to introduce a new cluster message type anyway, and the new message type should be light (without the route table). So I think it's better to use the new message instead of the replication link for sharded publish. Then we would not break the behavior on the user side. A new message type in the cluster bus is maybe an internal breaking change, but it would be introduced in a new major version, so I think it's not a big deal: all nodes in a cluster should use the same major version, except during a version upgrade, and losing some messages during an upgrade is acceptable.

@madolson (Member) commented May 23, 2024

Then we would not break the behavior on the user side. A new message type in the cluster bus is maybe an internal breaking change, but it would be introduced in a new major version, so I think it's not a big deal: all nodes in a cluster should use the same major version, except during a version upgrade, and losing some messages during an upgrade is acceptable.

I don't think we should lose messages during an upgrade, but I think an upgrade is pretty easy to orchestrate. We can send a bit if we support the multi-publish functionality, and if we don't receive the bit from a peer (or don't know its state), we can send messages in the old format.
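
A small compilable sketch of that per-peer fallback (the flag and helper names are hypothetical, not from the codebase):

  #include <stdio.h>

  #define NODE_CAPA_MULTI_PUBLISH (1 << 0)  /* advertised via a bit in the header */

  struct clusterNode { unsigned capabilities; const char *name; };

  static void sendMultiPublish(struct clusterNode *n, const char *c, const char *m) {
      printf("%s <- new light/multi-publish frame: %s %s\n", n->name, c, m);
  }

  static void sendLegacyPublish(struct clusterNode *n, const char *c, const char *m) {
      printf("%s <- legacy full clusterMsg publish: %s %s\n", n->name, c, m);
  }

  static void sendPublishToNode(struct clusterNode *n, const char *c, const char *m) {
      if (n->capabilities & NODE_CAPA_MULTI_PUBLISH)
          sendMultiPublish(n, c, m);    /* peer said it understands the new format */
      else
          sendLegacyPublish(n, c, m);   /* unknown or older peer: old format */
  }

  int main(void) {
      struct clusterNode upgraded = { NODE_CAPA_MULTI_PUBLISH, "node-a" };
      struct clusterNode legacy   = { 0, "node-b" };
      sendPublishToNode(&upgraded, "hello", "world");
      sendPublishToNode(&legacy,   "hello", "world");
      return 0;
  }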

@madolson (Member) commented

We're parking this PR for the time being while we evaluate if #557 will solve the scalability issues.
